Abstract

Over the last 50 years a steady stream of accounts has been written on the separation principle of stochastic control. Even in the context of the linear-quadratic regulator in continuous time with Gaussian white noise, subtle difficulties arise, unexpected by many, that are often overlooked. In this paper we propose a new framework for establishing the separation principle. This approach takes the viewpoint that stochastic systems are well-defined maps between sample paths rather than stochastic processes per se and allows us to extend the separation principle to systems driven by martingales with possible jumps. While the approach is more in line with "real-life" engineering thinking where signals travel around the feedback loop, it is unconventional from a probabilistic point of view in that control laws for which the feedback equations are satisfied almost surely, and not deterministically for every sample path, are excluded.

I. Introduction

One of the most fundamental principles of feedback theory is that the problems of optimal control and state estimation can be decoupled in certain cases. This is known as the separation principle. The concept was coined early on in [HTl, [32] and is closely connected to the idea of certainty equivalence; see, e.g., [38]. In studying the literature on the separation principle of stochastic control, one is struck by the level of sophistication and technical complexity. The source of the difficulties can be traced to the circular dependence between control and observations. The goal of this paper is to present a rigorous approach to the separation principle in continuous time which is rooted in the engineering view of systems as maps between signal spaces.

The most basic setting begins with a linear system 
\[
\begin{aligned}
dx &= A(t)x(t)\,dt + B_1(t)u(t)\,dt + B_2(t)\,dw \\
dy &= C(t)x(t)\,dt + D(t)\,dw
\end{aligned}
\tag{1}
\]

with a state process $x$, an output process $y$ and a control $u$, where $w(t)$ is a vector-valued Wiener process, $x(0)$ is a zero-mean Gaussian random vector independent of $w(t)$, $y(0) = 0$, and $A$, $B_1$, $B_2$, $C$, $D$ are matrix-valued functions of compatible dimensions, which we take to be continuous of bounded variation. Moreover, $DD'$ is nonsingular on the interval $[0,T]$, and if we want the noise processes in the state and output equations to be independent, as is often assumed but not required here, we take $B_2D' = 0$. All random variables and processes are defined over a common complete probability space $(\Omega, \mathcal{F}, P)$.
The control problem is to design an output feedback law

\[ \pi : y \mapsto u \tag{2} \]

over the window $[0,T]$ which maps the observation process $y$ to the control input $u$, in a nonanticipatory manner, so that the value of the functional

\[ J(u) = E\left\{ \int_0^T x(t)'Q(t)x(t)\,dt + \int_0^T u(t)'R(t)u(t)\,dt + x(T)'Sx(T) \right\} \tag{3} \]

is minimized, where $Q$ and $R$ are continuous matrix functions of bounded variation, $Q(t)$ is positive semi-definite and $R(t)$ is positive definite for all $t$. How to choose the admissible class of control laws $\pi$ has been the subject of much discussion in the literature [27]. The conclusion, under varying conditions, has been that $\pi$ can be chosen to be linear in the data and, more specifically, in the form

\[ u(t) = K(t)\hat{x}(t), \tag{4} \]

where $\hat{x}(t)$ is the Kalman estimate of the state vector $x(t)$ obtained from the Kalman filter

\[ d\hat{x} = A(t)\hat{x}(t)\,dt + B_1(t)u(t)\,dt + L(t)\big(dy - C(t)\hat{x}(t)\,dt\big), \quad \hat{x}(0) = 0, \tag{5} \]

and the gains $K$ and $L$ are computed by solving a pair of dual Riccati equations.
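As a rough numerical illustration of how the control gain $K$ can be obtained, the following sketch integrates the control Riccati equation backward in time and forms $K(t) = -R^{-1}B_1'P(t)$, using the form of the equation stated as (11a)-(11b) in Section II. Constant coefficient matrices, the classical Runge-Kutta scheme, and the function names `riccati_rhs` and `control_gains` are assumptions made here for illustration only; the dual (filter) equation for $L$ is not covered.

```python
import numpy as np

def riccati_rhs(P, A, B1, Q, R):
    """Right-hand side of dP/dt = -A'P - PA + P B1 R^{-1} B1' P - Q."""
    return -A.T @ P - P @ A + P @ B1 @ np.linalg.solve(R, B1.T @ P) - Q

def control_gains(A, B1, Q, R, S, T, steps=2000):
    """Integrate the Riccati equation backward from P(T) = S with RK4.

    Returns the time grid t, the list of P(t) and the list of gains
    K(t) = -R^{-1} B1' P(t) on that grid. Constant coefficient matrices
    are an assumption of this sketch.
    """
    h = T / steps
    P = np.array(S, dtype=float)
    Ps = [P]
    for _ in range(steps):
        # One RK4 step from time t to time t - h.
        k1 = riccati_rhs(P, A, B1, Q, R)
        k2 = riccati_rhs(P - 0.5 * h * k1, A, B1, Q, R)
        k3 = riccati_rhs(P - 0.5 * h * k2, A, B1, Q, R)
        k4 = riccati_rhs(P - h * k3, A, B1, Q, R)
        P = P - (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        Ps.append(P)
    Ps.reverse()  # index i now corresponds to time t = i * h
    t = np.linspace(0.0, T, steps + 1)
    Ks = [-np.linalg.solve(R, B1.T @ P) for P in Ps]
    return t, Ps, Ks

# Scalar check: A = 0, B1 = Q = R = 1, S = 0 gives dP/dt = P^2 - 1,
# whose solution with P(T) = 0 is P(t) = tanh(T - t).
one = np.array([[1.0]])
zero = np.array([[0.0]])
t, Ps, Ks = control_gains(zero, one, one, one, zero, T=1.0)
```

In the scalar check the computed $P(0)$ should agree with $\tanh(1)$ and the gain $K(0)$ with $-\tanh(1)$, up to discretization error.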

A result of this kind is far from obvious, and the early literature was marred by treatments of the separation principle where the non-Gaussian element introduced by an a priori nonlinear control law $\pi$ was overlooked. The subtlety lies in excluding the possibility that a nonlinear controller extracts more information from the data than is otherwise possible. This point will be explained in detail in Section II, where a brief historical account of the problem will be given. Early expositions of the separation principle often fall in one of two categories: either the subtle issues are overlooked and inadmissible shortcuts are taken, or the treatment is mathematically quite sophisticated and technically very demanding. The short survey in Section II will thus serve the purpose of introducing the theoretical challenges at hand, as well as setting up notation.

In this paper we take the point of view that feedback laws $\pi$ should act on sample paths of the stochastic process $y$ rather than on the process itself. This is motivated by engineering thinking where systems and feedback loops process signals. Thus, our key assumption on admissible control laws $\pi$ is that the resulting feedback loop is deterministically well-posed in the sense that the feedback equations admit a unique solution path-wise which causally depends on the input. For this class of control laws we prove that the separation principle stated above holds and moreover that it extends to systems driven by general martingale noise. More precisely, in this non-Gaussian situation the Wiener process $w$ in (1) is replaced by an arbitrary martingale process with possible jumps such as a Poisson process martingale; see, e.g., fW, p. 87]. Then, we only need to replace the (linear) Kalman estimate $\hat{x}$ by the strict sense conditional mean

\[ \hat{x}(t) = E\{x(t) \mid \mathcal{Y}_t\}, \tag{6} \]

where

\[ \mathcal{Y}_t := \sigma\{y(\tau),\ \tau \in [0,t]\}, \quad 0 \le t \le T, \tag{7} \]

is the filtration generated by the output process; i.e., the family of increasing sigma fields representing the data as it is produced. The estimate $\hat{x}$ needs to be defined with care so that it constitutes a sufficiently regular stochastic process and is realized by a map acting on observations [2", page 17], [11]. Unfortunately, the results in the present paper come at a cost since our key assumption of well-posedness excludes control laws for which the feedback system fails to be defined sample-wise. Existence of strong solutions of the feedback equations is not enough to ensure well-posedness in our sense, as we will discuss below. In addition, the condition of deterministic well-posedness is often difficult to verify. Yet, besides the fact that we prove the separation principle for general martingale noise, the sample-wise viewpoint provides a simple explanation of why the separation principle may hold in the first place.

Before proceeding we recast the system model (1) in an integrated form which allows similar conclusions for more general linear systems in a unified setting. To this end, let




System (1) can now be expressed in the form

\[ z(t) = z_0(t) + \int_0^t G(t,\tau)u(\tau)\,d\tau, \qquad y(t) = Hz(t), \tag{8} \]

where $z_0$ is the process $z$ obtained by setting $u = 0$ and $G$ is a Volterra kernel. This integrated form encompasses a considerably wider class of controlled linear systems which includes delay-differential equations, following [26], [11], which will be taken up in Section VI. The corresponding feedback configuration is shown in Figure 1, where $g$ represents the Volterra operator

\[ g : (t,u) \mapsto \int_0^t G(t,\tau)u(\tau)\,d\tau, \tag{9} \]

and $H$ is a constant matrix. As usual, Figure 1 is a graphical representation of the algebraic relationship

\[ z = z_0 + g\pi Hz. \tag{10} \]

For the particular model in (1), $H = [0, I]$, but in general $H$ could be any matrix or linear system. Setting $z := x$ and $H = I$ we obtain the special case of complete state information.

Fig. 1. A feedback interconnection.

In a stochastic setting, the feedback equation (10) is said to have a unique strong solution if there exists a non-anticipating function $F$ such that $z = F(z_0)$ satisfies (10) with probability one and all other solutions coincide with $z$ with probability one. It is important to note that in our sample-wise setting we require more, namely that such a unique solution exists and that (10) holds for all $z_0$, not only "almost all." Consequences of this requirement will be further elaborated upon below.

The outline of the paper is as follows. In Section II we begin by reviewing the standard quadratic regulator problem and pointing out the subtleties created by possible nonlinearities in the control law. We then review several strategies in the literature to establish a separation principle, chiefly restricting the class of admissible controls. Section III defines notions of signals and systems used in our framework, and in Section IV we establish necessary conditions for a feedback loop to make sense and deduce a basic fact about propagation of information in the loop through linear components. In Section V we state and prove our main results on the separation principle for linear-quadratic regulator problems, allowing also for more general martingale noise. Finally, in Section VI we prove a separation theorem for delay systems with Gaussian martingale noise.

II. Historical remarks 

A common approach to establishing the basic separation principle stated at the beginning of Section I is a completion-of-squares argument similar to the one used in deterministic linear-quadratic-regulator theory; see, e.g., [1]. For ease of reference, we briefly review this construction. Given the system (1) and the solution of the matrix Riccati equation

\[ \dot{P} = -A'P - PA + PB_1R^{-1}B_1'P - Q, \quad P(T) = S, \tag{11a} \]

Itô's differential rule (see, e.g., [19], [31]) yields

\[ d(x'Px) = x'\dot{P}x\,dt + 2x'P\,dx + \mathrm{tr}(B_2'PB_2)\,dt, \]

where $\mathrm{tr}(M)$ denotes the trace of the matrix $M$. Then from (1) and (11a) it readily follows that

\[ d(x'Px) = \big[-x'Qx - u'Ru + (u - Kx)'R(u - Kx)\big]\,dt + \mathrm{tr}(B_2'PB_2)\,dt + 2x'PB_2\,dw, \]

where

\[ K(t) := -R(t)^{-1}B_1(t)'P(t). \tag{11b} \]
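The drift identity behind this completion-of-squares step is purely algebraic and can be checked numerically at a frozen time instant. The sketch below assumes randomly generated matrices (with $Q$ positive semi-definite, $R$ positive definite and $P$ symmetric) and arbitrary vectors $x$ and $u$; all numerical values are hypothetical and serve only to compare the two sides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

A = rng.standard_normal((n, n))
B1 = rng.standard_normal((n, m))
M = rng.standard_normal((n, n))
Q = M @ M.T                          # positive semi-definite
N = rng.standard_normal((m, m))
R = N @ N.T + np.eye(m)              # positive definite
W = rng.standard_normal((n, n))
P = W @ W.T                          # any symmetric P at a frozen time t

# Riccati right-hand side (11a) and gain (11b).
P_dot = -A.T @ P - P @ A + P @ B1 @ np.linalg.solve(R, B1.T @ P) - Q
K = -np.linalg.solve(R, B1.T @ P)

x = rng.standard_normal(n)
u = rng.standard_normal(m)

# dt-terms of x' P_dot x dt + 2 x' P dx, leaving out the trace and dw terms.
lhs = x @ P_dot @ x + 2 * x @ P @ (A @ x + B1 @ u)

# dt-terms of the completed square, again without the trace term.
e = u - K @ x
rhs = -x @ Q @ x - u @ R @ u + e @ R @ e

drift_gap = abs(lhs - rhs)
```

The two sides should agree to rounding error for any choice of $x$, $u$ and symmetric $P$, which is why the cross terms in $u$ disappear once $K$ is chosen as in (11b).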

Integrating this from $0$ to $T$ and taking mathematical expectation, we obtain the following expression for the cost functional (3):

\[ J(u) = E\left\{ x(0)'P(0)x(0) + \int_0^T (u - Kx)'R(u - Kx)\,dt \right\} + \int_0^T \mathrm{tr}(B_2'PB_2)\,dt. \tag{12} \]

To ensure that $x'PB_2\,dw$ has zero expectation, we need to check that the integrand is square integrable almost surely. It is clear that $u$ is square integrable, for otherwise $J(u) = \infty$. Then the state process

\[ x(t) = x_0(t) + \int_0^t \Phi(t,s)B_1(s)u(s)\,ds \tag{13} \]

is square integrable as well. Here $x_0$ is the (square integrable) state process corresponding to $u = 0$, and $\Phi(t,s)$ is the transition matrix function of the system (1).
Now, if we had complete state information with (1) replaced by

\[
\begin{aligned}
dx &= A(t)x(t)\,dt + B_1(t)u(t)\,dt + B_2(t)\,dw \\
y &= x
\end{aligned}
\tag{14}
\]

we could immediately conclude that the feedback law

\[ u(t) = K(t)x(t) \tag{15} \]

is optimal, because the last term in (12) does not depend on the control. However, when we have incomplete state information with the control being a function of the observed process $\{y(s);\ 0 \le s \le t\}$, things become more complicated. Mathematically we formalize this by having any control process adapted to the filtration (7); i.e., having $u(t)$ $\mathcal{Y}_t$-measurable for each $t \in [0,T]$. Then, setting

\[ \tilde{x}(t) := x(t) - \hat{x}(t) \tag{16} \]

with $\hat{x}$ given by (6), we have $E\{[u(t) - K(t)\hat{x}(t)]\tilde{x}(t)'\} = 0$, and therefore

\[ E\int_0^T (u - Kx)'R(u - Kx)\,dt = E\int_0^T \big[(u - K\hat{x})'R(u - K\hat{x}) + \mathrm{tr}(K'RK\Sigma)\big]\,dt, \tag{17} \]

where $\Sigma$ is the covariance matrix

\[ \Sigma(t) := E\{\tilde{x}(t)\tilde{x}(t)'\}. \tag{18} \]

A common mistake in the early literature on the separation principle is to assume without further investigation that $\Sigma$ does not depend on the choice of control. Indeed, if this were the case, it would follow directly that (12) is minimized by choosing the control as (4), and the proof of the separation principle would be immediate. (Of course, in the end this will be the case under suitable conditions, but this has to be proven.) This mistake probably originates from the observation that the control term in (13) cancels when forming (16) so that

\[ \tilde{x}(t) = \tilde{x}_0(t) := x_0(t) - \hat{x}_0(t), \tag{19} \]

where

\[ \hat{x}_0(t) := E\{x_0(t) \mid \mathcal{Y}_t\}. \tag{20} \]

However, in this analysis, we have not ruled out that $\hat{x}_0$ depends on the control or, what would follow from this, that the filtration (7) does. A detailed discussion of this conundrum can be found in [27]. In fact, since the control process $u$ is in general a nonlinear function of the data and thus non-Gaussian, then so is the output process $y$.^1 Consequently, the conditional expectation (20) might not in general coincide with the wide sense conditional expectation obtained by projections of the components of $x_0(t)$ onto the closed linear span of the components of $\{y(\tau),\ \tau \in [0,t]\}$, and therefore, a priori, it could happen that $\hat{x}$ is not generated by the Kalman filter (5).

To avoid these problems one might begin by uncoupling the feedback loop as described in Figure 2, and determine an optimal control process in the class of stochastic processes $u$ that are adapted to the family of sigma fields

\[ \mathcal{Y}^0_t := \sigma\{y_0(\tau),\ \tau \in [0,t]\}, \quad 0 \le t \le T, \tag{21} \]

i.e., such that, for each $t \in [0,T]$, $u(t)$ is a function of $y_0(s)$; $0 \le s \le t$. Such a problem, where one optimizes over the class of all control processes adapted to a fixed filtration, was called a stochastic open loop (SOL) problem in [27]. In the literature on the separation principle it is not uncommon to assume from the outset that the control is adapted to $\{\mathcal{Y}^0_t\}$; see, e.g., [6, Section 2.3], [16], [40].

Fig. 2. A stochastic open loop (SOL) configuration.

In [27] it was suggested how to embed the class of admissible controls in various SOL classes in a problem-dependent manner, and then construct the corresponding feedback law. More precisely, in the present context, the class of admissible feedback laws was taken to consist of the nonanticipatory functions $u := \pi(y)$ such that the feedback loop

\[ z = z_0 + g\pi Hz \tag{22} \]

has a unique solution $z_\pi$ and $u = \pi(Hz_\pi)$ is adapted to $\{\mathcal{Y}^0_t\}$. Next, we shall give a few examples of specific classes of feedback laws that belong to this general class.

Example 1: It is common to restrict the admissible class of control laws to contain only linear ones; see, e.g., [12]. In a more general direction, let $\mathcal{L}$ be the class

\[ (\mathcal{L}) \qquad u(t) = \bar{u}(t) + \int_0^t F(t,\tau)\,dy, \tag{23} \]

where $\bar{u}$ is a deterministic function and $F$ is an $L_2$ kernel. In this way, the Gaussian property will be preserved, and $\hat{x}$ will be generated by the Kalman filter (5). Then it follows from (1) and (5) that $\tilde{x}$ is generated by

\[ d\tilde{x} = (A - LC)\tilde{x}\,dt + (B_2 - LD)\,dw, \quad \tilde{x}(0) = x(0), \]

which is clearly independent of the choice of control. Then so is the error covariance (18), as desired. Even in the more general setting described by (8), it was shown in [26, pp. 95-96] that

\[ \mathcal{Y}_t = \mathcal{Y}^0_t, \quad t \in [0,T], \tag{24} \]

for any $\pi \in \mathcal{L}$, where (21) is the filtration generated by the uncontrolled output process $y_0$ obtained by setting $u = 0$ in (8).
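The cancellation of the control term in the error equation can be observed sample-wise in a discretized scalar instance of (1) and (5). The following sketch is an assumption-laden illustration: the coefficients, the fixed filter gain `L`, the Euler time stepping and the function name `simulate_error` are all hypothetical. It only shows that the recursion for $x - \hat{x}$ driven by (5) does not see $u$; it says nothing about whether the output of (5) coincides with the conditional mean (6) when the control is nonlinear, which is precisely the delicate point discussed above.

```python
import numpy as np

def simulate_error(control, steps=500, h=0.01, seed=1):
    """Euler scheme for a scalar instance of (1) and (5); returns x - x_hat."""
    a, b1, b2, c, d = -0.5, 1.0, 0.8, 1.0, 0.5   # hypothetical coefficients
    L = 1.2                                      # hypothetical fixed filter gain
    rng = np.random.default_rng(seed)
    # Two independent noise components, w = (w1, w2): w1 drives the state,
    # w2 the observation, so B2 = [b2, 0] and D = [0, d].
    dw = rng.standard_normal((steps, 2)) * np.sqrt(h)
    x, x_hat = 0.7, 0.0
    err = np.empty(steps + 1)
    err[0] = x - x_hat
    for k in range(steps):
        u = control(x_hat)                       # feedback through the estimate
        dy = c * x * h + d * dw[k, 1]
        x_new = x + (a * x + b1 * u) * h + b2 * dw[k, 0]
        x_hat = x_hat + (a * x_hat + b1 * u) * h + L * (dy - c * x_hat * h)
        x = x_new
        err[k + 1] = x - x_hat
    return err

# Same noise path (same seed), three different control laws.
err_zero = simulate_error(lambda xh: 0.0)
err_linear = simulate_error(lambda xh: -2.0 * xh)
err_nonlinear = simulate_error(lambda xh: np.sin(5.0 * xh))
```

For a fixed noise path the three error trajectories should coincide up to rounding, in line with the error equation being free of $u$.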



^1 However, the model is conditionally Gaussian given the filtration $\{\mathcal{Y}_t\}$; see Remark 6.
Example 2: In his influential paper [41], Wonham proposed the class of control laws

\[ u(t) = \psi(t, \hat{x}(t)) \tag{25} \]

in terms of the state estimate (6), where $\psi(t,x)$ is Lipschitz continuous in $x$. For pedagogical reasons, we first highlight a somewhat more restrictive construction due to Kushner [21]. Let

\[ \hat{\xi}_0(t) := E\{x_0(t) \mid \mathcal{Y}^0_t\} \]

be the Kalman state estimate of the uncontrolled system

\[
\begin{aligned}
dx_0 &= A(t)x_0(t)\,dt + B_2(t)\,dw \\
dy_0 &= C(t)x_0(t)\,dt + D(t)\,dw
\end{aligned}
\]

Here we use the notation $\hat{\xi}_0$ to distinguish it from $\hat{x}_0$, defined by (20), which a priori depends on the control. Then the Kalman filter takes the form

\[ d\hat{\xi}_0 = A\hat{\xi}_0(t)\,dt + L(t)\,dv_0, \quad \hat{\xi}_0(0) = 0, \]

where the innovation process

\[ dv_0 = dy_0 - C\hat{\xi}_0(t)\,dt, \quad v_0(0) = 0, \]

generates the same filtration, $\{\mathcal{Y}^0_t\}$, as $y_0$; i.e., $\mathcal{V}^0_t = \mathcal{Y}^0_t$ for $t \in [0,T]$. This is well-known, but a simple proof is given on page 18 in Section VI in a more general setting; see (65). Now, along the lines of (13), define

\[ \hat{\xi}(t) = \hat{\xi}_0(t) + \int_0^t \Phi(t,s)B_1(s)u(s)\,ds, \]

where the control is chosen as

\[ u(t) = \psi(t, \hat{\xi}(t)). \tag{27} \]

Since $\psi$ is Lipschitz, $\hat{\xi}$ is the unique strong solution of the stochastic differential equation

\[ d\hat{\xi} = \big(A\hat{\xi}(t) + B_1\psi(t, \hat{\xi}(t))\big)\,dt + L(t)\,dv_0, \quad \hat{\xi}(0) = 0, \tag{28} \]

and it is thus adapted to $\{\mathcal{V}^0_t\}$ and hence to $\{\mathcal{Y}^0_t\}$; see, e.g., [19, p. 120]. Hence the selection (27) of control law forces $u$ to be adapted to $\{\mathcal{Y}^0_t\}$, and hence, due to

\[ dy = dy_0 + \int_0^t C(t)\Phi(t,s)B_1(s)u(s)\,ds\,dt, \tag{29} \]

obtained from (13), $\mathcal{Y}_t \subset \mathcal{Y}^0_t$ for $t \in [0,T]$. However, since the control-dependent terms cancel,

\[ dv_0 = dy_0 - C\hat{\xi}_0(t)\,dt = dy - C\hat{\xi}(t)\,dt, \]

which inserted into (28) yields a stochastic differential equation, obeying the appropriate Lipschitz condition, driven by $dy$ and having $\hat{\xi}$ as a strong solution. Therefore, $\hat{\xi}$ is adapted to $\{\mathcal{Y}_t\}$, and hence, by (27), so is $u$. Consequently, (29) implies that $\mathcal{Y}^0_t \subset \mathcal{Y}_t$ for $t \in [0,T]$ so that actually (24) holds. Finally, this implies that $\hat{\xi} = \hat{x}$, and thus $u$ is given by (25). However, it should be noted that the class of control laws (27) is a subclass of (25) as it has been constructed to make $u$ a priori adapted to $\{\mathcal{Y}^0_t\}$. Therefore, the relevance of these results, presented in [21], for the proof in [22, page 348] is unclear. In their popular textbook [20], widely used as a reference source for the validity of the separation principle over a general class of admissible (including nonlinear) controls, Kwakernaak and Sivan prove the separation principle over a class of linear laws but claim with reference to [22], [21] that it holds "without qualification" in general [20, p. 390]. (However, see Remark 6 below.)

In his pioneering paper [41], Wonham proved the separation theorem for controls in the class (25) even with a more general cost functional than (3). However, the proof is far from simple and marred by many technical assumptions. A case in point is the assumption that $C(t)$ is square and has a determinant bounded away from zero, which is a serious restriction. A later proof by Fleming and Rishel [15] is considerably simpler. They also prove the separation theorem with quadratic cost functional (3) for a class of Lipschitz continuous feedback laws, namely

\[ u(t) = \phi(t, y), \tag{30} \]

where $\phi : [0,T] \times C^n[0,T] \to \mathbb{R}^m$ is a nonanticipatory function of $y$ which is Lipschitz continuous in this argument.

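Example 3: It is interesting to note that if there is a delay in the processing of the observed data so that, for each $t$, $u(t)$ is a function of $y(\tau)$; $0 \le \tau \le t - \varepsilon$, then

\[ \mathcal{Y}_t = \mathcal{Y}^0_t, \quad t \in [0,T]. \]

To see this, let $n$ be a positive integer, and suppose that $\mathcal{Y}_t = \mathcal{Y}^0_t$ for $t \in [0, n\varepsilon]$. Since $u(t)$ is $\mathcal{Y}_{t-\varepsilon}$-measurable, on $[0, (n+1)\varepsilon]$ it is at the same time $\mathcal{Y}_{t-\varepsilon}$- as well as $\mathcal{Y}^0_{t-\varepsilon}$-measurable. Then, since

\[ y(t) = y_0(t) + \int_0^t HG(t,s)u(s)\,ds, \]

it follows that $\mathcal{Y}_t = \mathcal{Y}^0_t$ for $t \in [0, (n+1)\varepsilon]$. Since $\mathcal{Y}_t = \mathcal{Y}^0_t$ for $t \in [0, \varepsilon]$, (24) follows by induction.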

Remark 4: This is the reason why the problem with possibly control-dependent sigma fields does not occur in the usual discrete-time formulation. Indeed, in this setting, the error covariance (18) will not depend on the control, while, as we have mentioned, some more analysis is needed to rule out that its continuous-time counterpart does. This invalidates a procedure used in several textbooks (see, e.g., [36]) in which the continuous-time $\Sigma$ is constructed as the limit of finite difference quotients of the discrete-time $\Sigma$, which, as we have seen in Example 3, does not depend on the control, and which simply is the solution of a discrete-time matrix Riccati equation. However, we cannot a priori conclude that the continuous-time $\Sigma$ satisfies this Riccati equation. For this we need (24), or alternatively arguments such as in Remark 6. Otherwise the argument is circular.

Remark 5: Historically, a popular approach was introduced by Duncan and Varaiya, and Davis and Varaiya [14], [13], based on weak solutions of the relevant stochastic differential equation. The driving noise is Wiener and the approach utilizes the Girsanov transformation to recast the problem in a way so that the filtration of the observation process is independent of the input process (see [6, Section 2.4]). Very briefly, by an appropriate change of probability measure,

\[ d\tilde{w} = B_1u\,dt + B_2\,dw \]

can be transformed into a new Wiener process, which in the sense of weak solutions [19] is the same as any other Wiener process. In this way, the filtration $\{\mathcal{Y}_t\}$ can be fixed to be constant with respect to variations in the control. In this paper we do not consider weak solutions since our observation process is not arbitrary from an applications point of view.

Remark 6: Yet another approach to the separation principle is based on the fact that, although (1) with a nonlinear control is non-Gaussian, the model is conditionally Gaussian given the filtration $\{\mathcal{Y}_t\}$ [29, Chapter 16.1]. This fact can be used to show that $\hat{x}$ is actually generated by a Kalman filter [29, Chapters 11 and 12]. This last approach requires quite a sophisticated analysis and is restricted to the case where the driving noise is a Wiener process.

A key point for establishing the separation principle is to identify admissible control laws for which (24) holds. For each such control law $\pi$ we need a solution of the feedback equation (10), i.e., a pair $(z_0, z)$ of stochastic processes that satisfies

\[ z = z_0 + g\pi Hz. \tag{31} \]

Since $z_0$ is the driving process, it is natural to seek a solution $z$ which causally depends on $z_0$ and is unique. If this is the case then $z$ is a strong solution; otherwise it is a weak solution. There are well-known examples of stochastic differential equations that have only weak solutions [19, page 137], [37], [5]. Moreover, as we have mentioned in Remark 5, weak solutions circumvent the need to establish the equivalence (24) between filtrations. Thus, it has been suggested that the framework of weak solutions is the appropriate one for control problems [04] page 149]. Yet, from an applications point of view, where the control needs to be causally dependent on observed data, this is in our view questionable. In fact, in the present paper we take an even more stringent view on the causal dependence. We require that (31) has a unique strong solution which specifies a measurable map $z_0 \mapsto z$ between sample paths (cf. [19, Remark 5.2, p. 128], [34, p. 122]), thus modeling correspondence between signals; we further elaborate upon this in Section IV.

In short, we only allow control laws which are physically realizable in an engineering sense, in that they induce a signal that travels through the feedback loop. This comes at a price since there are stochastic differential equations having strong solutions that do not fall in this category (Remark 12). Moreover, verifying that a control law is admissible in our sense may be difficult in general. On the other hand, an advantage of the approach is that the class of control laws includes discontinuous ones and allows for statements about linear systems driven by non-Gaussian noise with possible jumps. We now proceed to develop the approach and the key property of deterministic well-posedness.

III. Signals and systems

Signals are thought of as sample paths of a stochastic process with possible discontinuities. This is quite natural from several points of view. First, it encompasses the response of a typical nonlinear operation that involves thresholding and switching, and second, it includes sample paths of counting processes and other martingales. More specifically we consider signals to belong to the Skorohod space $D$; this is defined as the space of functions which are continuous on the right and have a left limit at all points, i.e., the space of càdlàg functions.^2 It contains the space $C$ of continuous functions as a proper subspace. The notation $D[0,T]$ or $C[0,T]$ emphasizes the time interval where signals are being considered.

Traditionally, the comparison of two continuous functions in the uniform topology relates to how much their graphs need to be perturbed so as to be carried onto one another by changing only the ordinates, with the time-abscissa being kept fixed. However, in order to metrize $D$ in a natural manner one must recognize the effect of uncertainty in measuring time and allow a respective deformation of the time axis as well. To this end, let $\mathcal{K}$ denote the class of strictly increasing, continuous mappings of $[0,T]$ onto itself and let $I$ denote the identity map. Then, for $x, y \in D[0,T]$,

\[ d(x,y) := \inf_{\kappa \in \mathcal{K}} \max\{\|\kappa - I\|,\ \|x - y \circ \kappa\|\}, \]

with $\|\cdot\|$ the supremum norm, defines a metric on $D[0,T]$ which induces the so-called Skorohod topology. A further refinement, so as to ensure bounds on the slopes of the chords of $\kappa$, renders $D[0,T]$ separable and complete, that is, $D[0,T]$ is a Polish space; see E Theorem 12.2].

Systems are thought of as general measurable nonanticipatory maps from $D \to D$ sending sample paths to sample paths so that their output at any given time $t$ is a measurable function of past values of the input and of time. More precisely, let



Then, a measurable map $f : D[0,T] \to D[0,T]$ is said to be a system if and only if

\[ \Pi_\tau f \Pi_\tau = \Pi_\tau f \quad \text{for all } \tau \in [0,T]. \]

An important class of systems is provided by stochastic differential equations with Lipschitz coefficients driven by a Wiener process [34, Theorem 13.1]. These have path-wise unique strong solutions. Strong solutions induce maps between corresponding path spaces [34, page 127], [19, pages 126-128]. Also, under fairly general conditions (see, e.g., [33, Chapter V]), stochastic differential equations driven by martingales with sample paths in $D$ have strong solutions that are semimartingales.

^2 "continu à droite, limite à gauche" in French, alternatively RCLL ("right continuous with left limits") in English.

Besides stochastic differential equations in general, and those in ^ in particular, other nonlinear maps may serve as systems. For instance, discontinuous hysteresis nonlinearities as well as non-Lipschitz static maps such as $u \mapsto y := \sqrt{|u|}$ are reasonable as systems from an engineering viewpoint. Indeed, these induce maps from $D \to D$ (or from $C \to D$, as in the case of relay hysteresis), are seen to be systems according to our definition,^3 and can be considered as components of nonlinear feedback laws. We note that a nonlinearity such as $u \mapsto y = \mathrm{sign}(u)$ is not a system in the sense of our definition since the output is not in general in $D$. Such nonlinearities, which often appear in bang-bang control, need to be approximated with a physically realizable hysteretic system.
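The nonanticipation condition $\Pi_\tau f \Pi_\tau = \Pi_\tau f$ in the definition of a system can be probed on sampled paths. In the sketch below, the truncation operator is assumed to keep the samples up to index $\tau$ and set the later ones to zero; this discrete stand-in, the helper names and the three example maps are hypothetical choices for illustration. Passing the check on one path is of course only a necessary indication, not a proof that a map is a system.

```python
import numpy as np

def truncate(x, tau):
    """Hypothetical discrete stand-in for Pi_tau: keep samples up to index tau, zero the rest."""
    out = np.zeros_like(x)
    out[: tau + 1] = x[: tau + 1]
    return out

def is_nonanticipatory(f, x):
    """Check Pi_tau f Pi_tau = Pi_tau f on one sampled path x, for every index tau."""
    return all(
        np.allclose(truncate(f(truncate(x, tau)), tau), truncate(f(x), tau))
        for tau in range(len(x))
    )

def running_sum(x):
    """Causal map: the output at index k uses inputs up to k only."""
    return np.cumsum(x)

def sqrt_abs(x):
    """Static non-Lipschitz map u -> sqrt(|u|), applied sample by sample."""
    return np.sqrt(np.abs(x))

def one_step_advance(x):
    """Anticipatory map: the output at index k is the input at index k + 1."""
    return np.concatenate([x[1:], [0.0]])

path = np.random.default_rng(2).standard_normal(50)
causal_ok = is_nonanticipatory(running_sum, path)
static_ok = is_nonanticipatory(sqrt_abs, path)
advance_ok = is_nonanticipatory(one_step_advance, path)
```

On this path the running sum and the static square-root map pass the check, whereas the one-step advance fails it, since its output at index $\tau$ depends on the input at index $\tau + 1$.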



IV. Well-posedness and a key lemma 

It is straightforward to construct examples of deterministically well-posed feedback interconnections with elements as above. However, the situation is a bit more delicate when considering feedback loops since it is also perfectly possible that, at least mathematically, they give rise to unrealistic behavior. A standard example is that of a feedback loop with causal components that "implements" a perfect predictor. Indeed, consider a system $f$ which superimposes its input with a delayed version of it, i.e.,

\[ f : z(t) \mapsto z(t) + z(t - t_{\mathrm{delay}}), \]

for $t \ge 0$, and assume initial conditions $z(t) = 0$ for $t < 0$. Then the feedback interconnection of Figure 3 is unrealistic as it behaves as a perfect predictor. The feedback equation

\[ z(t) = z_0(t) + f(z(t)) = z_0(t) + z(t) + z(t - t_{\mathrm{delay}}) \]

gives rise to $0 = z_0(t) + z(t - t_{\mathrm{delay}})$, and hence,

\[ z(t) = -z_0(t + t_{\mathrm{delay}}). \]

Therefore, the output process $z$ is not causally dependent on the input. The question of well-posedness of feedback systems has been studied from different angles for over forty years. See for instance the monograph by Jan Willems [39].
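The perfect-predictor behavior can be reproduced on a sampled time grid. The sketch below assumes a delay of `d` samples and an input whose first `d` samples vanish (so that it is compatible with the zero initial condition); the function names and the test signal are hypothetical. It checks that $z[k] = -z_0[k+d]$ satisfies the loop equation and that perturbing the input at one time changes the loop signal at an earlier time.

```python
import numpy as np

def loop_solution(z0, d):
    """Candidate solution z[k] = -z0[k + d]; samples beyond the horizon are set to 0."""
    z = np.zeros_like(z0)
    z[:-d] = -z0[d:]
    return z

def loop_residual(z, z0, d):
    """Residual of z = z0 + f(z) with f(z)[k] = z[k] + z[k - d] and z = 0 before time 0."""
    delayed = np.concatenate([np.zeros(d), z[:-d]])
    return z - (z0 + z + delayed)

d = 3
z0 = np.random.default_rng(3).standard_normal(40)
z0[:d] = 0.0                      # compatibility with the zero initial condition
z = loop_solution(z0, d)
residual = loop_residual(z, z0, d)

# Perturb the input at index 20 and locate the first index where z changes.
z0_perturbed = z0.copy()
z0_perturbed[20] += 1.0
z_perturbed = loop_solution(z0_perturbed, d)
first_change = int(np.flatnonzero(z_perturbed != z)[0])
```

Here `first_change` equals 17, that is, `d` samples before the perturbation of the input, which is the sampled counterpart of the non-causal dependence $z(t) = -z_0(t + t_{\mathrm{delay}})$.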

In our present setting of stochastic control we need a concept of well-posedness which ensures that signals inside a feedback loop are causally dependent on external inputs. This is a natural assumption from a systems point of view.

Definition 7: A feedback system is deterministically well-posed if the closed-loop maps are themselves systems; i.e., the feedback equation $z = z_0 + f(z)$ has a unique solution $z$ for inputs $z_0$ and the operator $(1 - f)^{-1}$ is itself a system.


Fig. 3. Basic feedback system.

^3 More precisely, to be seen as a system, relay hysteresis needs to be preceded by a low-pass filter since its domain consists of continuous functions.


Thus, now thinking about $z_0$ and $z$ in the feedback system in Figure 3 as stochastic processes, deterministic well-posedness implies that $\mathcal{Z}_t \subset \mathcal{Z}^0_t$ for $t \in [0,T]$, where $\mathcal{Z}_t$ and $\mathcal{Z}^0_t$ are the sigma-fields generated by $z$ and $z_0$, respectively. This is a consequence of the fact that $(1 - f)^{-1}$ is a system. Likewise, since $(1 - f)$ is also a system, $\mathcal{Z}^0_t \subset \mathcal{Z}_t$ so that in fact

\[ \mathcal{Z}^0_t = \mathcal{Z}_t, \quad t \in [0,T]. \tag{32} \]

Next we consider the situation in Figure 1 and the relation between $\mathcal{Y}_t$ and the filtration $\mathcal{Y}^0_t$ of the process $y_0 = Hz_0$. The latter represents the "uncontrolled" output process where the control law $\pi$ is taken to be identically zero. A key technical lemma for what follows states that the filtrations $\mathcal{Y}_t$ and $\mathcal{Y}^0_t$ are also identical if the feedback system is deterministically well-posed. This is not obvious at first sight, solely on the basis of the linear relationships $y = Hz$ and $y_0 = Hz_0$, as the following simple example demonstrates: the two vector processes (q) and (°) generate the same filtrations while (1 0)(g) and (1 0)(°) do not.

Lemma 8: If the feedback interconnection in Figure 1 is deterministically well-posed, $g\pi$ is a system, and $H$ is a linear system having a right inverse that is also a system, then $(1 - Hg\pi)^{-1}$ is a system and $\mathcal{Y}_t = \mathcal{Y}^0_t$, $t \in [0,T]$.

Remark 9: Note that, for the prototype problem involving (1), the conditions on $H$ in Lemma 8 are trivial as $H = [0\ \ I]$ and hence $H^{-R} := H'$ is a right inverse. The requirement in the lemma that $g\pi$ is a system allows for a more general situation where $\pi$ is not itself a system (e.g., generating outputs not in $D$), but where the cascade connection is still admissible.

Proof: By well-posedness $(1 - g\pi H)^{-1}$ is a system. To show that $(1 - Hg\pi)^{-1}$ exists and is a system, first note that

\[ (1 - Hg\pi)H = H - Hg\pi H = H(1 - g\pi H). \tag{33} \]

The first step is using left distributivity and the second is using the fact that $H$ is linear. But then

\[ (1 - Hg\pi)H(1 - g\pi H)^{-1}H^{-R} = I, \tag{34} \]

where $HH^{-R} = I$. Thus, $h := H(1 - g\pi H)^{-1}H^{-R}$ is a "right inverse" of $p := (1 - Hg\pi)$ in that the composition $p \circ h$ of the two maps is the identity. We claim that $h$ is in fact the inverse of $p$ (which is necessarily unique) in that $y = h(y_0)$ and

\[ (1 - Hg\pi)y = y_0 \tag{35} \]

establish a bijective correspondence between $y$ and $y_0$, i.e., that both $p \circ h$ as well as $h \circ p$ are identity maps. We need to show the latter. The only potential problem would be if two distinct values $y$ and $\hat{y}$ satisfy (35) for the same value for $y_0$. We now show that this is not possible.

Since $H$ is right invertible, $y_0$ can be written in the form $y_0 = Hz_0$ for $z_0 = H^{-R}y_0$. Let $z = (1 - g\pi H)^{-1}z_0$ and $y = Hz$. Then $y = h(y_0)$, so by (34) $y$ is a particular solution of equation (35). Now let $\hat{y}$ be another solution, i.e., suppose that

\[ (1 - Hg\pi)\hat{y} = y_0 \tag{36} \]

and that $\hat{y} \ne y$. We begin by writing $\hat{y}$ in the form $\hat{y} = H\hat{z}$, which can always be done since $H$ is right invertible. Next we set $\hat{z}_0 := (1 - g\pi H)\hat{z}$. Then, by well-posedness, $\hat{z}$ is the unique solution of

\[ \hat{z} = \hat{z}_0 + g\pi H(\hat{z}). \tag{37} \]

Moreover, by (33) and (36), $H\hat{z}_0 = y_0$, and consequently $\hat{z}_0 = z_0 + v$ with $Hv = 0$. We now claim that $\hat{z} = z + v$, which would then contradict the assumption that $\hat{y} \ne y$. To show this, note that, since $z = z_0 + g\pi Hz$ and $H$ is linear,

\[ z + v = z_0 + v + g\pi H(z + v). \]
But the solution to (37) is unique by well-posedness. Hence, $\hat{z} = z + v$, which proves our claim. Therefore, finally, $(1 - Hg\pi)$ is invertible and

\[ (1 - Hg\pi)^{-1} = h = H(1 - g\pi H)^{-1}H^{-R} \]

is itself a system, being a composition of systems. Thus, the configuration in Fig. 4 is deterministically well-posed. Using (33) once again,

\[ H(1 - g\pi H)^{-1} = (1 - Hg\pi)^{-1}H. \tag{38} \]

It now follows that

\[ y = H(1 - g\pi H)^{-1}z_0 = (1 - Hg\pi)^{-1}Hz_0 = (1 - Hg\pi)^{-1}y_0, \tag{39} \]

while (35) also holds. Equation (39) shows that $\mathcal{Y}_t \subset \mathcal{Y}^0_t$, whereas (35) shows that $\mathcal{Y}^0_t \subset \mathcal{Y}_t$. ∎
The essence of the lemma⁴ is to underscore the equivalence between the original feedback configuration and
the one in Figure 4. It is this equivalence which accounts for the identity $\mathcal{Y}_t = \mathcal{Y}_t^0$ between the respective
$\sigma$-algebras. An analogous notion of well-posedness was considered by Willems in [40] where, however,
in contrast, the well-posedness of the feedback configuration in Figure 4, and consequently the validity
of $\mathcal{Y}_t = \mathcal{Y}_t^0$, is assumed at the outset.
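
The equivalence of the two loops can be illustrated on a toy discrete-time analogue. The sketch below is not the construction of the paper: the strictly causal plant `g`, the memoryless law `pi` and the sample path `z0` are all hypothetical choices, picked only so that both loops can be solved step by step for a single fixed sample path. It checks that solving $z = z_0 + g\pi Hz$ and then taking $y = Hz$ gives the same signal as solving $y = y_0 + Hg\pi y$ directly, and that $y_0$ is recovered from $y$ through $(1 - Hg\pi)$.

```python
import math

def pi(y_t):
    # Hypothetical memoryless nonlinear feedback law (an assumption for illustration).
    return math.tanh(y_t)

def g(u, t):
    # Hypothetical strictly causal plant: the response at step t uses u[0..t-1] only.
    x_part = sum(u[:t])
    y_part = 0.5 * sum(u[:t])
    return (x_part, y_part)

def H(z_t):
    # Output map H = [0 I]: pick the y-component of z = (x, y).
    return z_t[1]

def solve_z_loop(z0):
    # Solve z = z0 + g(pi(H z)) recursively, one time step at a time.
    u, z = [], []
    for t in range(len(z0)):
        gx, gy = g(u, t)
        z_t = (z0[t][0] + gx, z0[t][1] + gy)
        z.append(z_t)
        u.append(pi(H(z_t)))
    return z

def solve_y_loop(y0):
    # Solve y = y0 + H g(pi(y)) recursively.
    u, y = [], []
    for t in range(len(y0)):
        y_t = y0[t] + H(g(u, t))
        y.append(y_t)
        u.append(pi(y_t))
    return y

def recover_y0(y):
    # Apply (1 - H g pi) to y; this should give back y0.
    u = [pi(v) for v in y]
    return [y[t] - H(g(u, t)) for t in range(len(y))]

z0 = [(0.3, -1.0), (0.1, 0.4), (-0.7, 0.9), (0.2, -0.2), (1.5, 0.0)]
y0 = [H(z_t) for z_t in z0]
z = solve_z_loop(z0)
y = solve_y_loop(y0)
```

Because `g` is strictly causal, each loop has exactly one solution per sample path, which is the discrete counterpart of deterministic well-posedness; no probability measure enters the computation.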



Fig. 4. An equivalent feedback configuration.



In the present paper we consider only feedback laws that render the feedback system deterministically 
well-posed. Therefore we highlight the conditions in a formal definition. 

Definition 10: A feedback law $\pi$ is deterministically well-posed for the system (8) if $g\pi$ is a system
and the feedback loop $z = z_0 + g\pi Hz$ is deterministically well-posed.

If the feedback law $\pi$ is deterministically well-posed, then, by Lemma 8, the feedback loop in Figure 4
is also deterministically well-posed. Thus, in essence, given the assumption that $z = z_0 + g\pi Hz$ admits
a pathwise unique strong solution, so does $y = y_0 + Hg\pi y$.

Remark 11: For pedagogical reasons, we consider the case of complete state information, corresponding
to (14). This corresponds to taking $H = I$ and $z = x$, and the basic feedback loop is as depicted in Figure 5.
Then the basic condition for well-posedness (32) states that the filtration $\{\mathcal{X}_t\}$, where $\mathcal{X}_t := \sigma\{x(s);\ s \in
[0,t]\}$, is constant under variations of the control. Consequently, we do not need Lemma 8 to resolve an
issue of circular control dependence. This is completely consistent with the analysis leading up to (15)
in Section II.

⁴It is interesting to note, as was pointed out by a referee, that the proof of the lemma relies critically on the action of the operator
$(1 - g\pi H)^{-1}$ on a null set, as the probability $P(z_0 = H^{-R}y_0) = 0$ for any nontrivial model. This fact may be disturbing from a
probabilistic point of view but does not invalidate the lemma.
Fig. 5. Feedback loop for complete state information.

Remark 12: We now present an example of a feedback system which fails to be deterministically
well-posed.⁵ Consider the system

$$dx = u\,dt + dw, \qquad y = x,$$

where $w$ is a Wiener process. Then, the control law $u = \pi(y)$ with $\pi(y) = \min\{|y|^{1/2}, 1\}$ is not
deterministically well-posed although the stochastic differential equation

$$dx = \pi(x)\,dt + dw$$

has a unique strong solution [18, Chapter 5, Proposition 5.17] in the sense that any other solution has the
same sample paths with probability one (indistinguishable). The failure to be deterministically well-posed
can be traced to the fact that this control law allows for multiple consistent responses for $w = 0$, a
physically questionable situation. Indeed, the ordinary differential equation $\dot{x} = \pi(x)$ is not Lipschitz and
has infinitely many solutions.
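
The non-uniqueness for $w = 0$ can be checked numerically. The sketch below assumes the drift $\pi(x) = \min\{|x|^{1/2}, 1\}$ (the exponent $1/2$ is an assumption made for illustration; any exponent in $(0,1)$ behaves the same way) and the initial value $x(0) = 0$. For every delay $c \ge 0$, the path that stays at zero until time $c$ and then follows $(t-c)^2/4$ satisfies $\dot{x} = \pi(x)$ as long as $x \le 1$, so the ODE has a whole family of solutions besides $x \equiv 0$.

```python
import math

def drift(x):
    # Assumed non-Lipschitz drift: min(sqrt(|x|), 1).
    return min(math.sqrt(abs(x)), 1.0)

def delayed_solution(t, c):
    # Candidate solution: zero up to time c, then (t - c)^2 / 4.
    # It stays in the region x <= 1 as long as t - c <= 2.
    return 0.0 if t <= c else (t - c) ** 2 / 4.0

def residual(c, t, h=1e-6):
    # Central-difference estimate of dx/dt minus the drift at x(t).
    dx = (delayed_solution(t + h, c) - delayed_solution(t - h, c)) / (2 * h)
    return dx - drift(delayed_solution(t, c))

grid = [0.05 * k for k in range(1, 40)]  # times in (0, 2)
worst = {c: max(abs(residual(c, t)) for t in grid) for c in (0.0, 0.5, 1.0)}
```

Each entry of `worst` is the largest ODE residual over the grid for one delay `c`; all of them are at the level of the finite-difference error, so each delayed path is a solution starting from the same initial value.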

V. The separation principle 

Our first result is a very general separation theorem for the classical stochastic control problem stated
at the beginning of Section II.

Theorem 13: Given the system (1), consider the problem of minimizing the functional (3) over the
class of all feedback laws $\pi$ that are deterministically well-posed for (1). Then the unique optimal control
law is given by (4), where $K$ is defined by (11), and $\hat{x}$ is given by the Kalman filter (5).

Proof: By Lemma 8, (18) does not depend on the control. Therefore, given the analysis at the
beginning of Section II, (4) is the unique optimal control provided it defines a deterministically well-posed
control law. It remains to show this.

Inserting (4) into (5) yields

$$\hat{x}(t) = \int_0^t \Psi(t,s)L(s)\,dy(s),$$

where the transition matrix $\Psi(t,s)$ of $[A(t) + B_1(t)K(t) - L(t)C(t)]$ has partial derivatives in both
arguments. Together with (4) this yields

$$u(t) = (\pi_{\mathrm{opt}}y)(t) := \int_0^t M(t,s)\,dy(s), \tag{40}$$

where $M(t,s) := K(t)\Psi(t,s)L(s)$. Clearly $s \mapsto M(t,s)$ has bounded variation for each $t \in [0,T]$, and
therefore integration by parts yields

$$(\pi_{\mathrm{opt}}y)(t) = M(t,t)y(t) - \int_0^t d_sM(t,s)\,y(s), \tag{41}$$

⁵This example was kindly suggested by a referee.

which is defined samplewise. Now inserting $u = \pi_{\mathrm{opt}}Hz$ into the feedback equation (10) we obtain

$$z = z_0 + g\pi_{\mathrm{opt}}Hz, \tag{42}$$

where $g\pi_{\mathrm{opt}}Hz$ takes the form

$$(g\pi_{\mathrm{opt}}Hz)(t) = \int_0^t N(t,s)\,dz(s) \quad\text{with}\quad N(t,s) = \int_s^t G(t,\tau)M(\tau,s)H\,d\tau,$$

where $G$ is the kernel of the Volterra operator (9). A simple calculation yields

$$\frac{\partial G}{\partial t}(t,\tau) = \begin{bmatrix} A(t) \\ C(t) \end{bmatrix}\Phi(t,\tau)B_1(\tau),$$

where $\Phi(t,s)$ is the transition matrix of $A$, and therefore $Q(t,s) := \frac{\partial N}{\partial t}(t,s)$ is a continuous Volterra
kernel, and so is the unique solution $R$ of the resolvent equation

$$R(t,s) = \int_s^t R(t,\tau)Q(\tau,s)\,d\tau + Q(t,s) \tag{43}$$

[35], [42]. From (42) we have

$$dz = dz_0 + \int_0^t Q(t,s)\,dz(s)\,dt,$$

from which it follows that

$$\int_0^t Q(t,s)\,dz(s) = \int_0^t R(t,s)\,dz_0(s).$$
Jo 

Consequently, each $z_0$ has a unique preimage under $(1 - g\pi_{\mathrm{opt}}H)$, given by

$$[(1 - g\pi_{\mathrm{opt}}H)^{-1}z_0](t) = z_0(t) + \int_0^t\int_\tau^t R(s,\tau)\,ds\,dz_0(\tau),$$

which is clearly a system, as claimed. Hence the feedback loop is deterministically well-posed. ■

Consequently, for a system driven by a Wiener process with Gaussian initial condition, the linear control
law defined by (4) and (5) is optimal in the class of all linear and nonlinear control laws for which the
feedback system is deterministically well-posed.

If we forsake the requirement that $\hat{x}$ is given by the Kalman filter (5), we can now allow $x_0$ to be
non-Gaussian and $w$ to be an arbitrary martingale, even allowing jumps.

Theorem 14: Given the system (1), where $w$ is a martingale and $x(0)$ is an arbitrary zero mean random
vector independent of $w$, consider the problem of minimizing the functional (3) over the class of all
feedback laws $\pi$ that are deterministically well-posed for (1). Then, provided it is deterministically
well-posed, the unique optimal control law is given by (4), where $K$ is defined by (11) and $\hat{x}$ is the conditional
mean $\hat{x}(t) = E\{x(t) \mid \mathcal{Y}_t\}$.

Proof: Given Lemma 8 we can use the same completion-of-squares argument as in Section II except
that we now need to use Itô's differential rule for martingales (see, e.g., [19], [33]), which, in integrated
form, becomes

$$x(T)'P(T)x(T) - x(0)'P(0)x(0) = f_\Delta + \int_0^T \bigl\{x(t)'\dot{P}(t)x(t)\,dt + 2x(t_-)'P(t)\,dx + \mathrm{tr}\bigl(P(t)\,d[x,x']\bigr)\bigr\}, \tag{44}$$

where $[x,x']$ is the quadratic variation of $x$ and $f_\Delta$ is an extra term which is in general nontrivial when
$w$ has a jump component. Now let

$$q(t) := \int_0^t \Phi(t,s)\bigl(A(s)x(s) + B_1(s)u(s)\bigr)\,ds,$$

where $\Phi$ is the transition function of (1), which is differentiable in both arguments. Then, $x = q + v$, where
$dv = B_2\,dw$ and $q$ is a continuous process with bounded variation. Therefore

$$[x,x'] = [q,q'] + 2[q,v'] + [v,v'] = [v,v'].$$

In fact, $[q,q'] = [q,v'] = 0$ [19, Corollary 8.5]. Since $v$ does not depend on the control $u$, neither does
the last term in the integral in (44). If $w$ has a jump component, we have a nontrivial extra term in (44),
namely

$$f_\Delta = \sum_{s \le T}\bigl[x(s)'P(s)x(s) - x(s_-)'P(s)x(s_-) - 2x(s_-)'P(s)\Delta_s - \Delta_s'P(s)\Delta_s\bigr],$$

where the sum is over all jump times $s$ on the interval $[0,T]$ and $\Delta_s := x(s) - x(s_-)$ is the jump, and we
need to ensure that this term does not depend on the control either. However, since $x(s) = x(s_-) + \Delta_s$,
we have $f_\Delta = 0$.

Then the rest of the proof that (4), with $\hat{x}$ the conditional mean, is the unique minimizer of (3) over all
deterministically well-posed control laws follows from an argument as in Section II. More precisely,
using (11) and completing the squares we obtain

$$\int_0^T x(t)'Q(t)x(t)\,dt + \int_0^T u(t)'R(t)u(t)\,dt + x(T)'Sx(T)$$
$$= x(0)'P(0)x(0) + \int_0^T (u - Kx)'R(u - Kx)\,dt + \int_0^T \mathrm{tr}\bigl(P(t)\,d[v,v']\bigr) + 2\int_0^T x(t_-)'P(t)B_2(t)\,dw. \tag{45}$$



Next we show that $E\bigl\{\int_0^T x(t_-)'P(t)B_2(t)\,dw\bigr\}$ does not depend on the control in this more general case
as well. Since only the term of $x$ that involves $u$ depends on the control, the problem reduces to showing that

$$E\left\{\int_0^T \left[\int_0^t \Phi(t,s)B_1(s)u(s)\,ds\right]' P(t)B_2(t)\,dw(t)\right\} = 0.$$

After a change in the order of integration this is equivalent to

$$E\left\{\int_0^T u(t)'\int_t^T F(t,s)\,dw(s)\,dt\right\} = 0, \tag{46}$$

where $F(t,s) = B_1(t)'\Phi(s,t)'P(s)B_2(s)$. However, since $u(t)$ is $\mathcal{W}_t$-measurable, where $\mathcal{W}_t$ is the sigma-field
generated by $\{w(s);\ 0 \le s \le t\}$, the left member of (46) can be written

$$E\left\{\int_0^T u(t)'\,E\left\{\int_t^T F(t,s)\,dw(s)\,\Big|\,\mathcal{W}_t\right\}dt\right\},$$

which is zero since $w$ is a martingale. In view of (17), where (18) does not depend on the control (Lemma
8), the statement of the theorem follows. ■
We note that in general the optimal control law does not belong to C and that x is not given by the 
Kalman filter (5) but by the conditional mean, which then has to be chosen with some care since
it is only defined almost surely as a projection for each individual time $t$. To this end it is standard to
select the optional projection of $x(t)$ on $\mathcal{Y}_t$, which is a stochastic process with a càdlàg version [2, page
17]. Often $\hat{x}$ is given by a nonlinear filter as in the following example. However, even in those cases,
it is difficult to ascertain well-posedness. At present, we are unable to establish that the control law in
the example is deterministically well-posed and hence optimal in our admissible class of controls. We
conjecture that Theorem 14 can be strengthened by removing the a priori assumption of well-posedness
for the case where the optimal filter can be expressed as a stochastic differential equation with locally
Lipschitz coefficients by suitable use of stopping times. Such a strengthening would suffice to prove
optimality for the following example where we are currently unable to prove well-posedness.

Fig. 6. Model for step change in white noise.

Example 15: Consider the system in Figure 6. Here, $x$ represents a parameter which undergoes a sudden
random step change due to a random external forcing $v$. The step can be in either direction. Thus, as a
stochastic process $v(t)$ is defined

$$v(t) = \begin{cases} 0 & \text{for } t < \tau \\ \theta & \text{for } t \ge \tau \end{cases} \tag{47}$$

where $\theta = \pm 1$ with equal probability and $\tau$ is a random variable uniformly distributed on $[0,T]$. Clearly
$v$ is a martingale. Our goal is to maintain a value for the state $x$ close to zero on the interval $[0,T]$ via
integral control action through $u$, indirectly, by demanding that

$$E\left\{\int_0^T (x^2 + Ru^2)\,dt\right\}$$

be minimal with $R > 0$. Here, $u$ denotes the control. The process $x$ is observed in additive white noise
$w$. The system is now written in the standard form (1) as follows:

$$\begin{aligned} dx &= u(t)\,dt + dv, \qquad x(0) = 0, \\ dy &= x(t)\,dt + \sigma\,dw \end{aligned} \tag{48}$$

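where $w$ is a Wiener process. We first solve the Riccati equation $\dot{k} = -k^2 + R^{-1}$ with boundary condition
$k(T) = 0$ to obtain $k(t) = -R^{-1/2}\tanh\bigl(R^{-1/2}(T - t)\bigr)$. The control law in Theorem 14 is

$$u(t) = k(t)\hat{x}(t), \tag{49a}$$

where the conditional expectation is determined separately using a (nonlinear) Wonham-Shiryaev filter

$$d\hat{x} = k(t)\hat{x}(t)\,dt + \frac{1}{\sigma^2}\bigl(1 - p(t)^2 - 2(T - t)\varphi(t)\bigr)\bigl(dy - \hat{x}(t)\,dt\bigr) \tag{49b}$$

$$dp = \frac{1}{\sigma^2}\bigl(1 - p(t)^2 - 2(T - t)\varphi(t)\bigr)\bigl(dy - \hat{x}(t)\,dt\bigr) \tag{49c}$$

$$d\varphi = -\frac{1}{\sigma^2}\varphi(t)p(t)\bigl(dy - \hat{x}(t)\,dt\bigr) \tag{49d}$$

with $p(0) = 0$ and $\varphi(0) = 1/(2T)$, the latter following from $\varphi = D^{-1}$ and $D(0) = 2T$ in the appendix. Following [16, page 222] we explain the steps for deriving the filter equations
in Appendix VIII.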
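
The closed-form gain can be sanity-checked numerically. The following sketch evaluates $k(t) = -R^{-1/2}\tanh(R^{-1/2}(T-t))$ and compares a finite-difference derivative with the right-hand side $-k^2 + R^{-1}$ of the Riccati equation; the horizon `T = 2.0` and weight `R = 0.5` are arbitrary illustrative values, not taken from the example.

```python
import math

def k(t, T, R):
    # Closed-form gain k(t) = -R^{-1/2} tanh(R^{-1/2} (T - t)).
    a = 1.0 / math.sqrt(R)
    return -a * math.tanh(a * (T - t))

def riccati_residual(t, T, R, h=1e-5):
    # Central-difference estimate of k'(t) minus the right-hand side -k^2 + 1/R.
    kdot = (k(t + h, T, R) - k(t - h, T, R)) / (2 * h)
    return kdot - (-(k(t, T, R) ** 2) + 1.0 / R)

T, R = 2.0, 0.5  # illustrative values (assumptions)
grid = [0.1 * i for i in range(1, 20)]
max_residual = max(abs(riccati_residual(t, T, R)) for t in grid)
terminal_gain = k(T, T, R)
```

Here `max_residual` measures how well the formula satisfies the differential equation on the grid, and `terminal_gain` checks the boundary condition $k(T) = 0$.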

In order to conclude that the control law (49) is actually optimal we need to establish that the feedback
loop is deterministically well-posed. This requires that (10) has a unique solution for each $z_0$.
Noting that the innovation $dy - \hat{x}\,dt$ can be expressed as

$$dy - \hat{x}(t)\,dt = (v(t) - p(t))\,dt + \sigma\,dw,$$

this requires that the stochastic differential equations (48)-(49) can be uniquely solved path-wise as a map
from $z_0$ to $z$. There are conditions in the literature for when path-wise uniqueness
holds (see [34, page 126, Theorem 10.4], [19, page 128], and the references therein). However, we are
not able at present to verify that these hold in our case.

In view of Remark 11 we immediately have the following corollary to Theorem 14 for the case of
complete state information. A similar statement was given in [27] in a different context.

Corollary 16: Given the system (14), where $w$ is a martingale and $x(0)$ is an arbitrary random vector
independent of $w$, consider the problem of minimizing the functional (3) over the class of all feedback
laws $\pi$ that are deterministically well-posed for (14). Then the unique optimal control law is given by
(15), where $K$ is defined by (11).

Proof: It just remains to prove that the control law (15) is deterministically well-posed. To this end,
we first note that (with $z = x$) the feedback equation (10) becomes

$$x(t) = x_0(t) + \int_0^t Q(t,s)x(s)\,ds,$$

where $Q(t,s) = \Phi(t,s)B_1(s)K(s)$ with $\Phi$ (as before) being the transition matrix function of $A$. Then a
straightforward calculation shows that

$$x(t) = x_0(t) + \int_0^t R(t,s)x_0(s)\,ds,$$

where $R$ is the unique solution of the resolvent equation (43). This establishes well-posedness. ■
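
The resolvent representation can be checked on a scalar toy case. The sketch below assumes a constant kernel $Q(t,s) = q$ (for instance $A = 0$, $B_1 = 1$ and a constant gain $K = q$, all hypothetical choices), for which the resolvent equation (43) has the solution $R(t,s) = q\,e^{q(t-s)}$. With the illustrative input path $x_0(t) = 1 + t$, the equation $x = x_0 + \int_0^t q\,x\,ds$ has the closed-form solution $x(t) = (1 + 1/q)e^{qt} - 1/q$, and the code compares it with the resolvent formula evaluated by trapezoidal quadrature.

```python
import math

def trapz(f, a, b, n=2000):
    # Composite trapezoidal rule on [a, b] with n subintervals.
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

q = -0.8  # hypothetical constant kernel value Q(t, s) = q

def x0(t):
    # Illustrative deterministic input path (an assumption).
    return 1.0 + t

def resolvent(t, s):
    # Solves R(t, s) = int_s^t R(t, tau) q dtau + q for the constant kernel.
    return q * math.exp(q * (t - s))

def x_resolvent(t):
    # x(t) = x0(t) + int_0^t R(t, s) x0(s) ds
    return x0(t) + trapz(lambda s: resolvent(t, s) * x0(s), 0.0, t)

def x_closed(t):
    # Solution of x' = 1 + q x with x(0) = 1.
    return (1.0 + 1.0 / q) * math.exp(q * t) - 1.0 / q

errors = [abs(x_resolvent(t) - x_closed(t)) for t in (0.25, 0.5, 1.0, 2.0)]
```

The same map $x_0 \mapsto x$ is applied to each sample path separately, which is why the representation gives well-posedness in the deterministic sense.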
Example 17: Let the driving noise $w$ in (14) be given by either a Poisson martingale [19, page 87], or
a geometric Brownian motion [19, page 124]

$$dw = \mu w(t)\,dt + \sigma w(t)\,dv,$$

where $v$ is a Wiener process, or a combination. Then the control law $u(t) = K(t)x(t)$ is optimal for the
problem of minimizing the functional (3).

VI. The separation principle for delay-differential systems 

The formulation (8) covers more general stochastic systems than the ones considered above. An example
is a delay-differential system of the type

$$\begin{aligned} dx &= A_1(t)x(t)\,dt + A_2(t)x(t-h)\,dt + \int_{t-h}^t A_3(t,s)x(s)\,ds\,dt + B_1(t)u(t)\,dt + B_2(t)\,dw \\ dy &= C_1(t)x(t)\,dt + C_2(t)x(t-h)\,dt + D(t)\,dw \end{aligned}$$

Apparently, stochastic control for various versions of such systems was first studied in [23], [24],
[26], [27], and [9], although [9] relies on the strong assumption that the observation $y$ is "functionally
independent" of the control $u$, thus avoiding the key question studied in the present paper.
Here, as in [26], we shall consider the wider class of stochastic systems

$$\begin{aligned} dx &= \left(\int_{t-h}^t d_sA(t,s)\,x(s)\right)dt + B_1(t)u(t)\,dt + B_2(t)\,dw \\ dy &= \left(\int_{t-h}^t d_sC(t,s)\,x(s)\right)dt + D(t)\,dw \end{aligned} \tag{50}$$

where $A$ and $C$ are of bounded variation in the first argument and continuous on the right in the second,
$x(t) = \xi(t)$ is deterministic (for simplicity) for $-h \le t \le 0$, and $y(0) = 0$. More precisely, $A(t,s) = 0$
for $s \ge t$, $A(t,s) = A(t,t-h)$ for $s \le t - h$, and the total variation of $s \mapsto A(t,s)$ is bounded by
an integrable function in the variable $t$, and the same holds for $C$. Moreover, to avoid technicalities we
assume that $w$ is now a (square-integrable) Gaussian (vector) martingale. Now, the first of equations (50)
can be written in the form

$$x(t) = \Phi(t,0)\xi(0) + \int_{-h}^0 d_\tau\left\{\int_0^t \Phi(t,s)A(s,\tau)\,ds\right\}\xi(\tau) + \int_0^t \Phi(t,s)B_1(s)u(s)\,ds + \int_0^t \Phi(t,s)B_2(s)\,dw \tag{51}$$

[26, p. 85], where $\Phi$ is the Green's function corresponding to the deterministic system [3] (also see, e.g.,
[26, p. 101]). In the same way, we can express the second equation in integrated form. Consequently,
(50) can be written in the form (8), where $K$ and $H$ are computed as in [26, pp. 101-103]. The problem
is to find a feedback law $\pi$ that minimizes

$$J(u) := E\{V_0(x,u)\} \tag{52}$$

subject to the constraint (50), where

$$V_s(x,u) := \int_s^T x(t)'Q(t)x(t)\,d\alpha(t) + \int_s^T u(t)'R(t)u(t)\,dt \tag{53}$$

and $d\alpha$ is a positive Stieltjes measure.

Lemma 8 enables us to strengthen the results in [26]. To this end, to avoid technicalities, we shall
appeal to a representation result from [27] rather than using a completion-of-squares argument, although
the latter strategy would lead to a stronger result where $w$ could be an arbitrary martingale. A
completion-of-squares argument for a considerably simpler problem was given in [8], but, as pointed out in [28], this
paper suffers from a mistake similar to the one pointed out earlier on page 4 in the present paper. In this
context, we also mention the recent paper [4], which considers optimal control of a stochastic system with
delay in the control. This paper assumes at the outset that the separation principle for delay systems is
valid with a reference to [20]. Instead of basing the argument on [20], which is not quite appropriate here,
their claim could be justified by noting that the delay in the control also implies a delay in information
as in Example 3 above.

Now, it can be shown that the corresponding deterministic control problem obtained by setting $w = 0$
has an optimal linear feedback control law

$$u(t) = \int_{t-h}^t d_\tau K(t,\tau)\,x(\tau), \tag{54}$$

where we refer the reader to [26] for the computation of $K$. The following theorem is a considerable
strengthening of the corresponding result in [26].

Theorem 18: Given the system (50), where $w$ is a Gaussian martingale, consider the problem of
minimizing the functional (52) over the class of all feedback laws $\pi$ that are deterministically well-posed
for (1). Then the unique optimal control law is given by

$$u(t) = \int_{t-h}^t d_sK(t,s)\,\hat{x}(s|t), \tag{55}$$

where $K$ is the deterministic control gain (54) and

$$\hat{x}(s|t) := E\{x(s) \mid \mathcal{Y}_t\} \tag{56}$$

is given by a linear (distributed) filter

$$d\hat{x}(t|t) = \int_{t-h}^t d_sA(t,s)\,\hat{x}(s|t)\,dt + B_1u\,dt + X(t,t)\,dv \tag{57a}$$

$$d_t\hat{x}(s|t) = X(s,t)\,dv, \qquad s \le t \tag{57b}$$

where $v$ is the innovation process

$$dv = dy - \int_{t-h}^t d_sC(t,s)\,\hat{x}(s|t)\,dt, \qquad v(0) = 0, \tag{58}$$

and the gain $X$ is as defined in [26, p. 120].

For the proof of Theorem 18 we shall need two lemmas. The first is a slight reformulation of Lemma
4.1 in [27] and only requires that $v$ be a martingale.
Lemma 19 ([27]): Let $v$ be a square-integrable martingale with natural filtration

$$\mathcal{V}_t = \sigma\{v(s),\ s \in [0,t]\}, \qquad 0 \le t \le T \tag{59}$$

and satisfying $E\{v_jv_k\} = \beta_j\delta_{jk}$, where $\beta_k$, $k = 1,2,\ldots,p$, are nondecreasing functions, and $\delta_{jk}$ is the
Kronecker delta equal to one for $j = k$ and zero otherwise. With $u$ a square-integrable control process
adapted to $\{\mathcal{V}_t\}$, let

$$u(t) = \bar{u}(t) + \sum_{k=1}^p \int_0^t u_k(t,s)\,dv_k(s) + \tilde{u}(t) \tag{60}$$

be the unique orthogonal decomposition for which $\bar{u}$ is deterministic and, for each $t \in [0,T]$, $\tilde{u}(t)$ is
orthogonal to the linear span of the components of $\{v(s),\ s \in [0,t]\}$. Moreover, let $x_0$ be a
square-integrable process adapted to $\{\mathcal{V}_t\}$ and having a corresponding orthogonal decomposition

$$x_0(t) = \bar{x}_0(t) + \sum_{k=1}^p \int_0^t x_k^0(t,s)\,dv_k(s) + \tilde{x}_0(t). \tag{61}$$

Then $x = x_0 + g(u)$, defined by (8) exchanging $z$ for $x$, has the orthogonal decomposition

$$x(t) = \bar{x}(t) + \sum_{k=1}^p \int_0^t x_k(t,s)\,dv_k(s) + \tilde{x}(t), \tag{62}$$

where

$$\bar{x}(t) = \bar{x}_0(t) + \int_0^t G(t,\tau)\bar{u}(\tau)\,d\tau \tag{63a}$$

$$x_k(t,s) = x_k^0(t,s) + \int_s^t G(t,\tau)u_k(\tau,s)\,d\tau \tag{63b}$$

$$\tilde{x}(t) = \tilde{x}_0(t) + \int_0^t G(t,\tau)\tilde{u}(\tau)\,d\tau \tag{63c}$$

and

$$E\{V_0(x,u)\} = V_0(\bar{x},\bar{u}) + \sum_{k=1}^p \int_0^T V_s\bigl(x_k(\cdot,s),u_k(\cdot,s)\bigr)\,d\beta_k(s) + E\{V_0(\tilde{x},\tilde{u})\}. \tag{64}$$

For a proof of this lemma, we refer the reader to [27].

Lemma 20: Let $y$ be the output process of the closed-loop system obtained after applying a
deterministically well-posed feedback law $u = \pi(y)$ to the system (50). Then the innovation process (58) is a
Gaussian martingale, and the corresponding filtration (59) satisfies

$$\mathcal{V}_t = \mathcal{Y}_t, \qquad 0 \le t \le T. \tag{65}$$

Proof: As can be seen from the equation (51) and the remark following it, the process $y_0$ obtained
by setting $u = 0$ in (50) is given by $dy_0 = q_0(t)\,dt + D(t)\,dw$ for a process $q_0$ adapted to $\{\mathcal{W}_t\}$. Define
$dv_0 = dy_0 - \hat{q}_0(t)\,dt$, where $\hat{q}_0(t) := E\{q_0(t) \mid \mathcal{Y}_t^0\}$. Now, $q_0$ and $w$ are jointly Gaussian, and therefore,
for each $t \in [0,T]$, the components of $\hat{q}_0(t)$ belong to the closed linear span of the components of the
process $\{y_0(s),\ s \in [0,t]\}$, and hence

$$\hat{q}_0(t) = \int_0^t M(t,s)\,dy_0(s)$$

for some $L^2$-kernel $M$. Therefore, $v_0$ is Gaussian, and its natural filtration $\mathcal{V}_t^0$ satisfies $\mathcal{V}_t^0 \subset \mathcal{Y}_t^0$. Now
let $R$ be the resolvent of the Volterra equation with kernel $M$; i.e., the unique solution of the resolvent
equation

$$R(t,s) = \int_s^t R(t,\tau)M(\tau,s)\,d\tau + M(t,s)$$

[35], [42]. Then

$$\int_0^t R(t,s)\,dv_0(s) = \int_0^t M(t,s)\,dy_0(s) = \hat{q}_0(t),$$

and hence $\mathcal{Y}_t^0 \subset \mathcal{V}_t^0$. Consequently, in view of Lemma 8, $\mathcal{V}_t^0 = \mathcal{Y}_t^0 = \mathcal{Y}_t$. Next observe that

$$dy = q(t)\,dt + D(t)\,dw, \qquad q(t) := q_0(t) + h(u)(t),$$

where $h(u)$ is a causal (linear) function of the control $u$. Since $h(u)$ is adapted to $\{\mathcal{Y}_t\}$,

$$\hat{q}(t) = \hat{q}_0(t) + h(u)(t),$$

and therefore the innovation process (58) satisfies $dv = dy - \hat{q}(t)\,dt = dy_0 - \hat{q}_0(t)\,dt = dv_0$. Equation (65)
now follows.

Finally, to prove that the innovation process $v$ is a martingale we need to show that

$$E\{v(s) - v(t) \mid \mathcal{V}_t\} = 0 \quad\text{for all } s \ge t.$$

To this end, first note that

$$E\{v(s) - v(t) \mid \mathcal{V}_t\} = E\left\{\int_t^s \tilde{q}(\tau)\,d\tau \,\Big|\, \mathcal{V}_t\right\} + E\left\{\int_t^s D(\tau)\,dw \,\Big|\, \mathcal{V}_t\right\}, \tag{66}$$

where $\tilde{q}(t) := q(t) - \hat{q}(t)$. Since all the processes are jointly Gaussian (the control-dependent terms have
been canceled in forming $\tilde{q}$), independence is the same as orthogonality. Since $\tilde{q}(\tau) \perp \mathcal{V}_\tau \supset \mathcal{V}_t$ for $\tau \ge t$,
the first term in (66) is zero. The second term can be written

$$E\left\{E\left\{\int_t^s D(\tau)\,dw \,\Big|\, \mathcal{W}_t\right\} \Big|\, \mathcal{V}_t\right\},$$

which is zero since $w$ is a martingale. ■
We are now in a position to prove Theorem 18. Lemma 20 shows that the innovation process (58) is
a martingale. It is no restriction to assume that $E\{v(t)v(t)'\}$ is diagonal; if it is not, we just normalize
the innovation process by replacing $v(t)$ by $R(t)^{-1/2}v(t)$, where $R(t) := E\{v(t)v(t)'\} > 0$. Then we set
$\beta_k(t) := E\{v_k(t)^2\}$, $k = 1,2,\ldots,p$. Since $\mathcal{V}_t = \mathcal{Y}_t$ for $t \in [0,T]$ (Lemma 20), admissible controls take the
form (60). Moreover, the process $\hat{x}(t) := E\{x(t) \mid \mathcal{Y}_t\}$ is adapted to $\{\mathcal{V}_t\}$, and hence, analogously to
(60), it has the decomposition

$$\hat{x}(t) = \bar{x}(t) + \sum_{k=1}^p \int_0^t x_k(t,s)\,dv_k(s) + \tilde{x}(t), \tag{67}$$

which now will take the place of (62) in Lemma 19. As before, let $x_0$ be the process $x$ obtained by setting
$u = 0$. By Lemma 8, $\hat{x}_0$ does not depend on the control $u$. Moreover, since $x_0$ and $v$ are jointly Gaussian,

$$\hat{x}_0(t) = \bar{x}_0(t) + \sum_{k=1}^p \int_0^t x_k^0(t,s)\,dv_k(s), \tag{68}$$

replacing (61) in Lemma 19. Moreover,

$$E\{V_0(x,u)\} = E\{V_0(\hat{x},u)\} + E\{V_0(x - \hat{x}, 0)\},$$

where the last term does not depend on the control, since $x - \hat{x} = x_0 - \hat{x}_0$. Hence, by Lemma 19, the
problem is now reduced to finding a control (60) and a state process (67) minimizing $E\{V_0(\hat{x},u)\}$ subject
to

$$\bar{x}(t) = \bar{x}_0(t) + \int_0^t G(t,\tau)\bar{u}(\tau)\,d\tau \tag{69a}$$

$$x_k(t,s) = x_k^0(t,s) + \int_s^t G(t,\tau)u_k(\tau,s)\,d\tau \tag{69b}$$

$$\tilde{x}(t) = \int_0^t G(t,\tau)\tilde{u}(\tau)\,d\tau \tag{69c}$$
where the last equation has been modified to account for the fact that $\tilde{x}_0 = 0$. Clearly, this problem
decomposes into several distinct problems. First, $\bar{u}$ needs to be chosen so that $V_0(\bar{x},\bar{u})$ is minimized subject
to (69a). This is a deterministic control problem with the feedback solution

$$\bar{u}(t) = \int_{t-h}^t d_\tau K(t,\tau)\,\bar{x}(\tau), \tag{70}$$

where $K$ is as in (54). Secondly, for each $s \in [0,T]$ and $k = 1,2,\ldots,p$, $u_k(t,s)$ has to be chosen so as
to minimize $V_s(x_k(\cdot,s),u_k(\cdot,s))$ subject to (69b). This again is a deterministic control problem with the
optimal feedback solution

$$u_k(t,s) = \int_{t-h}^t d_\tau K(t,\tau)\,x_k(\tau,s). \tag{71}$$

Finally, $\tilde{u}$ should be chosen so as to minimize $E\{V_0(\tilde{x},\tilde{u})\}$ subject to (69c). This problem clearly has the
solution $\tilde{u} = 0$, and hence $\tilde{x} = 0$ as well. Combining these results and inserting them into (60) then yields
the optimal feedback control

$$u(t) = \int_{t-h}^t d_\tau K(t,\tau)\left[\bar{x}(\tau) + \sum_{k=1}^p \int_0^t x_k(\tau,s)\,dv_k(s)\right].$$

It remains to show that this is exactly the same as (55); i.e., that

$$\hat{x}(\tau|t) = \bar{x}(\tau) + \sum_{k=1}^p \int_0^t x_k(\tau,s)\,dv_k(s). \tag{72}$$

To this end, first note that, since the optimal control is linear in $dv$, $\hat{x}(\tau|t)$ will take the form

$$\hat{x}(\tau|t) = \bar{x}(\tau) + \int_0^t X_t(\tau,s)\,dv(s),$$

where $\bar{x}(\tau) = E\{x(\tau)\}$, the same as in (72). Clearly $E\{[x(\tau) - \hat{x}(\tau|t)]v(s)'\} = 0$ for $s \in [0,t]$, and
therefore

$$E\{x(\tau)v(s)'\} = E\{\hat{x}(\tau|t)v(s)'\} = \int_0^s X_t(\tau,\sigma)\,d\beta(\sigma),$$

showing that the kernel $X_t$ does not depend on $t$; hence this index will be dropped. Now, setting
$\tau = t$, comparing with (67) and noting that $\tilde{x} = 0$, we see that $X(t,s)$ is the matrix with columns
$x_1(t,s), x_2(t,s), \ldots, x_p(t,s)$, establishing (72), which from now on we shall write

$$\hat{x}(\tau|t) = \bar{x}(\tau) + \int_0^t X(\tau,s)\,dv(s). \tag{73}$$

Hence, (55) is the optimal control, as claimed. Moreover,

$$\hat{x}(\tau|t) = \hat{x}(\tau|s) + \int_s^t X(\tau,\sigma)\,dv(\sigma),$$

which yields (57a). To derive (57b), follow the procedure in [26].

It remains to show that the optimal control law (55) is deterministically well-posed. To this end, it is
no restriction to assume that $\bar{x}_0 = 0$ so that all processes have zero mean. Then it follows from (55) and
the unsymmetric Fubini Theorem of Cameron and Martin [10] that

$$u(t) = \int_0^t P(t,s)\,dv(s), \quad\text{where}\quad P(t,s) = \int_{t-h}^t d_\tau K(t,\tau)\,X(\tau,s),$$
and likewise from (58) that

$$dv = dy - \int_0^t S(t,s)\,dv(s)\,dt, \quad\text{where}\quad S(t,s) = \int_{t-h}^t d_\tau C(t,\tau)\,X(\tau,s).$$

The function $S$ is a Volterra kernel and therefore the Volterra resolvent equation

$$V(t,s) = -\int_s^t V(t,\tau)S(\tau,s)\,d\tau + S(t,s)$$

has a unique solution $V$, from which it follows that

$$dv = dy - \int_0^t V(t,s)\,dy(s)\,dt.$$

Then the optimal control law is given by (40), where now $M$ is given by

$$M(t,s) = P(t,s) - \int_s^t P(t,\tau)V(\tau,s)\,d\tau.$$

Now, for the optimal control law, $s \mapsto X(t,s)$ is of bounded variation for each $t$ [26], and hence so is
$s \mapsto M(t,s)$. Hence $\pi_{\mathrm{opt}}$ can be defined samplewise as in (41). To complete the proof that the optimal
feedback loop is deterministically well-posed we proceed exactly as in the proof of Theorem 13, noting
that in the present setting

^(t.)- fd P^^-^) 



where $\Phi(t,s)$ is the transition matrix of $A$ [26, p. 101].

Remark 21: It was shown in [27] that, in the case of complete state information ($y = x$), the control
(54) is optimal even when $w$ is an arbitrary (not necessarily Gaussian) martingale.



VII. Conclusions

In studying the literature on the separation principle of stochastic control, one encounters many
expositions where subtle difficulties are overlooked and inadmissible shortcuts are taken. On the other
hand, for most papers and monographs that provide rigorous derivations, one is struck by the level of
mathematical sophistication and technical complexity, which makes the material hard to include in standard
textbooks in a self-contained fashion. It is our hope that our use of deterministic well-posedness provides
an alternative mechanism for understanding the separation principle that is more palatable and transparent
to the engineering community, while still rigorous. The new insights offered by the approach allow us to
establish the separation principle also for systems driven by non-Gaussian martingale noise. However, in
this more general framework the key issue of establishing well-posedness for particular control systems
is challenging and more work needs to be done.



Acknowledgement 



We are indebted to an anonymous referee for significant input which has helped us improve the paper
considerably.






VIII. Appendix 

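Consider the "uncontrolled" observation process $dy_0 = v(t)\,dt + \sigma\,dw$. If $dP$ denotes the law of $(\theta, \tau, w)$
and $\Lambda(t) = \exp\bigl\{\sigma^{-2}\bigl(\int_0^t v(s)\,dy_0(s) - \tfrac{1}{2}\int_0^t v(s)^2\,ds\bigr)\bigr\}$ then, under a new measure $dQ := \Lambda(T)^{-1}dP$, $y_0$ becomes a (scaled) Wiener
process while the law of $v$ (i.e., of $\theta$ and $\tau$) is the same as before. Under $dQ$, the two processes $y_0$ and
$v$ are independent. The conditional expectation is now given by (Bayes' formula [16, page 174])

$$E_P(v(t) \mid \mathcal{Y}_t) = \frac{E_Q(v(t)\Lambda(t) \mid \mathcal{Y}_t)}{E_Q(\Lambda(t) \mid \mathcal{Y}_t)} = \frac{E_Q\bigl(e^{(y_0(t) - y_0(t\wedge\tau) - \frac{1}{2}(t-\tau)_+)/\sigma^2} - e^{(-(y_0(t) - y_0(t\wedge\tau)) - \frac{1}{2}(t-\tau)_+)/\sigma^2} \mid \mathcal{Y}_t\bigr)}{E_Q\bigl(e^{(y_0(t) - y_0(t\wedge\tau) - \frac{1}{2}(t-\tau)_+)/\sigma^2} + e^{(-(y_0(t) - y_0(t\wedge\tau)) - \frac{1}{2}(t-\tau)_+)/\sigma^2} \mid \mathcal{Y}_t\bigr)}. \tag{74}$$

Here $t \wedge \tau := \min(t,\tau)$, $I_{t\ge\tau}(t) = 1$ when $t \ge \tau$ and $0$ otherwise, and $(t - \tau)_+ = (t - \tau)I_{t\ge\tau}$. Note that
$v(t) = \theta I_{t\ge\tau}(t)$. For convenience we define $p(t) := E_P(v(t) \mid \mathcal{Y}_t)$ and

$$S(t) := \int_0^t e^{(y_0(t) - y_0(s) - \frac{1}{2}(t-s))/\sigma^2}\,ds \quad\text{and}\quad \bar{S}(t) := \int_0^t e^{(-(y_0(t) - y_0(s)) - \frac{1}{2}(t-s))/\sigma^2}\,ds.$$

From (74), $p(t) = N(t)/D(t)$ where

$$N(t) = S(t) - \bar{S}(t)$$
$$D(t) = S(t) + \bar{S}(t) + 2(T - t).$$

By first noting that $S$ and $\bar{S}$ satisfy the stochastic differential equations

$$dS = \sigma^{-2}S(t)\,dy_0 + dt$$
$$d\bar{S} = -\sigma^{-2}\bar{S}(t)\,dy_0 + dt,$$

respectively, the Itô rule applied to the expression $N(t)/D(t)$ for the conditional expectation gives the
filter equations (setting $\varphi = D^{-1}$)

$$dp = \sigma^{-2}\bigl(1 - p(t)^2 - 2(T - t)\varphi(t)\bigr)\bigl(dy_0 - p\,dt\bigr) \tag{75a}$$
$$d\varphi = -\sigma^{-2}\varphi(t)p(t)\bigl(dy_0(t) - p(t)\,dt\bigr). \tag{75b}$$

Finally, noting that the innovation $dy_0 - p\,dt$ is equal to $dy - \hat{x}\,dt$ for the controlled system, we obtain the
filter equations (49).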